Empirical Dynamic Programming

نویسندگان

  • William B. Haskell
  • Rahul Jain
  • Dileep M. Kalathil
چکیده

We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get ‘empirical value iteration’ (EVI). Policy evaluation and policy improvement in classical policy iteration are also replaced by simulation to get ‘empirical policy iteration’ (EPI). Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators. We develop a stochastic dominance framework for convergence analysis of such operators. We then use this to give sample complexity bounds for both EVI and EPI. We then provide various variations and extensions to asynchronous empirical dynamic programming, the minimax empirical dynamic program, and show how this can also be used to solve the dynamic newsvendor problem. Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bankruptcy Prediction: Dynamic Geometric Genetic Programming (DGGP) Approach

 In this paper, a new Dynamic Geometric Genetic Programming (DGGP) technique is applied to empirical analysis of financial ratios and bankruptcy prediction. Financial ratios are indeed desirable for prediction of corporate bankruptcy and identification of firms’ impending failure for investors, creditors, borrowing firms, and governments. By the time, several methods have been attempted in...

متن کامل

Biomechanical Investigation of Empirical Optimal Trajectories Introduced for Snatch Weightlifting

The optimal barbell trajectory for snatch weightlifting has been achieved empirically by several researchers. They have studied the differences between the elite weightlifters’ movement patterns and suggested three optimal barbell trajectories (type A, B, and C). But they didn’t agree for introducing the best trajectory. One of the reasons is this idea that the selected criterion by researchers...

متن کامل

Modern Computational Applications of Dynamic Programming

Computational dynamic programming, while of some use for situations typically encountered in industrial and systems engineering, has proved to be of much greater significance in many areas of computer science. We review some of these applications here.

متن کامل

A dynamic programming approach for solving nonlinear knapsack problems

Nonlinear Knapsack Problems (NKP) are the alternative formulation for the multiple-choice knapsack problems. A powerful approach for solving NKP is dynamic programming which may obtain the global op-timal solution even in the case of discrete solution space for these problems. Despite the power of this solu-tion approach, it computationally performs very slowly when the solution space of the pr...

متن کامل

A dynamic bi-objective model for after disaster blood supply chain network design; a robust possibilistic programming approach

Health service management plays a crucial role in human life. Blood related operations are considered as one of the important components of the health services. This paper presents a bi-objective mixed integer linear programming model for dynamic location-allocation of blood facilities that integrates strategic and tactical decisions. Due to the epistemic uncertain nature of ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Math. Oper. Res.

دوره 41  شماره 

صفحات  -

تاریخ انتشار 2016